GPU的诞生是一次根本性的突破,其驱动力是 “实时性要求”:在$1/60^{th}$秒(16.67毫秒)的时间窗口内渲染复杂3D场景的不可妥协要求。尽管CPU遵循了 多核发展路径 以低延迟串行执行优化的路径,但随着分辨率提升,它们遇到了瓶颈。
1. 16.67毫秒的约束
20世纪90年代中期,游戏行业陷入危机。一个串行处理的CPU,在处理人工智能和物理运算时,无法快速计算出数百万个像素值以维持流畅画面。这迫使人们必须开发专用硬件来卸载重复性的 图形流水线。
2. 扫描线交错(SLI)
在内部并行数组出现之前,3dfx公司推出了 扫描线交错(SLI)。通过使用两块独立显卡交替计算水平扫描线,整个行业将关注点从单线程速度转向了纯粹的“暴力算力”吞吐量。
3. 吞吐量与延迟
GPU的诞生优先考虑为简单的算术单元分配硅面积,而非复杂的分支预测。这种“宽而慢”的设计理念,使GPU能够处理三角形计算中的重复性数学任务,而CPU则专注于非并行逻辑。
main.py
TERMINALbash — 80x24
> Ready. Click "Run" to execute.
>
QUESTION 1
What is the specific 'time budget' required for 60 frames per second (FPS)?
33.33ms
16.67ms
10.00ms
100.00ms
✅ Correct!
Correct! $1000ms / 60 = 16.67ms$. This is the window a GPU has to complete all calculations.❌ Incorrect
Think of 1 second (1000ms) divided by 60 frames.QUESTION 2
How did 3dfx's SLI achieve early parallelism in consumer hardware?
By increasing the clock speed of a single chip.
By having two cards render alternating horizontal scan lines.
By sharing AI logic between the GPU and CPU.
By reducing the resolution of the frame.
✅ Correct!
SLI (Scan Line Interleave) was the first major 'brute force' scaling method for graphics.❌ Incorrect
It didn't change speed or resolution; it distributed the workload across two physical devices.QUESTION 3
Why did the GPU diverge from the standard multicore trajectory of CPUs?
GPUs needed deeper caches for complex branching.
GPUs prioritize throughput of simple math over low-latency serial logic.
CPUs became too expensive to manufacture for 3D graphics.
GPU architectures were designed to be smaller than CPUs.
✅ Correct!
CPUs focus on minimizing latency for single tasks; GPUs focus on maximizing the total number of tasks completed simultaneously (throughput).❌ Incorrect
Review the 'wide and slow' philosophy vs the CPU's complex control logic.QUESTION 4
In the context of 1990s gaming, what was the 'Real-Time Imperative'?
The requirement to run physics simulations on the GPU.
Processing millions of pixels within the strict frame window.
The transition from 16-bit to 32-bit computing.
Allowing the CPU to handle rasterization.
✅ Correct!
Without a dedicated accelerator, a CPU could not finish drawing a frame before the display refreshed, causing lag.❌ Incorrect
It refers to the strict temporal deadline of display refresh rates.QUESTION 5
What is meant by the GPU's 'Wide and Slow' philosophy?
Using many simple processors at lower clock speeds to do massive work.
Designing physically wide chips that take longer to process data.
A design that favors high latency but high memory capacity.
Optimizing for single-threaded serial logic.
✅ Correct!
GPUs use thousands of small cores. While one core is slow, the collective throughput is massive.❌ Incorrect
It refers to parallel width (many cores), not physical dimensions or slow overall performance.Case Study: The Quake Revolution
The transition from Software to Hardware Rasterization
In 1996, 'Quake' was a technical marvel. Running it on a CPU (Software Rendering) meant the processor handled AI, sound, networking, and every pixel's math. The Voodoo1 changed this by introducing a dedicated graphics pipeline.
Q
1. Why did offloading the graphics pipeline significantly improve 'Quake's' frame rate even if the CPU wasn't upgraded?
Solution:
Offloading removed the 'dumb math' of pixel rasterization from the CPU, freeing its cycles to focus exclusively on game logic and physics while the specialized GPU hardware calculated pixel values in parallel.
Offloading removed the 'dumb math' of pixel rasterization from the CPU, freeing its cycles to focus exclusively on game logic and physics while the specialized GPU hardware calculated pixel values in parallel.
Q
2. How does the 'Real-Time Imperative' explain the visual stuttering observed on CPUs when resolution was increased?
Solution:
As resolution increases, the number of pixels grows. Since a CPU processes pixels serially, the total time required eventually exceeds the 16.67ms frame budget, causing the system to wait and drop frames.
As resolution increases, the number of pixels grows. Since a CPU processes pixels serially, the total time required eventually exceeds the 16.67ms frame budget, causing the system to wait and drop frames.